optimal ratio
Optimal Ratio for Data Splitting
It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much of the data should be used for training and how much for testing. In this article, we show that the optimal training-to-testing splitting ratio is $\sqrt{p}:1$, where $p$ is the number of parameters in a linear regression model that explains the data well.
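As a rough illustration of the arithmetic behind the $\sqrt{p}:1$ rule, the sketch below turns the ratio into training and testing set sizes. The function name `split_sizes`, the rounding, and the guard that keeps at least one sample on each side are assumptions made for this example, not part of the article; how $p$ is determined is not covered here, so it is simply taken as an input.

```python
import math
from typing import Tuple


def split_sizes(n_samples: int, p: int) -> Tuple[int, int]:
    """Return (n_train, n_test) for a sqrt(p):1 train-to-test ratio.

    Hypothetical helper: p is assumed to be known already.
    """
    if n_samples < 2:
        raise ValueError("need at least two samples to split")
    if p < 1:
        raise ValueError("p must be a positive integer")

    root = math.sqrt(p)
    # A sqrt(p):1 ratio means the test set gets 1 / (sqrt(p) + 1) of the data.
    test_fraction = 1.0 / (root + 1.0)

    # Round to the nearest integer, keeping at least one sample on each side.
    n_test = round(n_samples * test_fraction)
    n_test = min(max(n_test, 1), n_samples - 1)
    return n_samples - n_test, n_test


if __name__ == "__main__":
    # With p = 9 the ratio is 3:1, so a quarter of the data is held out.
    print(split_sizes(1000, 9))
```

Under this rule the held-out fraction shrinks as $p$ grows: with $p = 1$ the split is even, and with $p = 9$ one quarter of the data goes to testing.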
Agriculture of the future: neural networks have learned to predict plant growth
Scientists from Skoltech have trained neural networks to evaluate and predict plant growth patterns, taking into account the main influencing factors, and to propose the optimal ratio between nutrient requirements and other growth-driving parameters. The results of the study were published in the journal IEEE Transactions on Instrumentation and Measurement. Over the past few years, multiple attempts have been made to use artificial intelligence (AI) in nearly all spheres of life. It has proven useful, helping people make the right decisions and achieve their goals. Using AI to grow plants in artificial environments is no exception.
Minimax Error of Interpolation and Optimal Design of Experiments for Variable Fidelity Data
Zaytsev, Alexey, Burnaev, Evgeny
Engineering problems often involve data sources of variable fidelity with different costs of obtaining an observation. In particular, one can use both a cheap low fidelity function (e.g., a computational experiment with a CFD code) and an expensive high fidelity function (e.g., a wind tunnel experiment) to generate a data sample in order to construct a regression model of a high fidelity function. The key question in this setting is how the sizes of the high and low fidelity data samples should be selected, prior to committing resources to data acquisition, in order to stay within a given computational budget and maximize the accuracy of the regression model. In this paper we obtain minimax interpolation errors for single and variable fidelity scenarios for multivariate Gaussian process regression. Evaluation of the minimax errors allows us to identify cases when variable fidelity data provides better interpolation accuracy than exclusively high fidelity data for the same computational budget. These results allow us to calculate the optimal shares of variable fidelity data samples under the given computational budget constraint. Real and synthetic data experiments suggest that using the obtained optimal shares often outperforms natural heuristics in terms of regression accuracy.
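The budget constraint described above can be made concrete with a small sketch. It does not compute the optimal shares derived in the paper; it only shows how a chosen share of the budget translates into sample sizes. The function `sample_sizes`, its parameters, and the choice to give any leftover budget to high fidelity observations are all assumptions made for this illustration.

```python
from typing import Tuple


def sample_sizes(budget: float, cost_high: float, cost_low: float,
                 low_share: float) -> Tuple[int, int]:
    """Return (n_high, n_low) that fit within the budget.

    Hypothetical helper: low_share is the fraction of the budget spent on
    low fidelity observations and is supplied by the caller, not optimized.
    """
    if not 0.0 <= low_share <= 1.0:
        raise ValueError("low_share must lie in [0, 1]")
    if budget < 0 or cost_high <= 0 or cost_low <= 0:
        raise ValueError("budget must be non-negative and costs positive")

    # Whole number of low fidelity observations affordable with their share.
    n_low = int(budget * low_share // cost_low)
    # Everything left over is spent on high fidelity observations.
    n_high = int((budget - n_low * cost_low) // cost_high)
    return n_high, n_low


if __name__ == "__main__":
    # A high fidelity run costs ten times as much as a low fidelity one.
    n_high, n_low = sample_sizes(budget=100, cost_high=10, cost_low=1,
                                 low_share=0.5)
    print(n_high, n_low)
```

Setting `low_share` to 0 recovers the single fidelity scenario in which the whole budget goes to high fidelity observations.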